Abstract. New York City’s multitentacled subway system was a major disseminator - if not the
principal transmission vehicle - of coronavirus infection during the initial takeoff of the massive 
epidemic that became evident throughout the city during March 2020. The near shutoff of 
subway ridership in Manhattan - down by over 90 percent at the end of March - correlates 
strongly with the substantial increase in the doubling time of new cases in this borough. Maps of 
subway station turnstile entries, superimposed upon zip code-level maps of reported coronavirus 
incidence, are strongly consistent with subway-facilitated disease propagation. Local train lines 
appear to have a higher propensity to transmit infection than express lines. Reciprocal seeding of 
infection appears to be the best explanation for the emergence of a single hotspot in Midtown 
West in Manhattan. Bus hubs may have served as secondary transmission routes out to the 
periphery of the city. 
Introduction

This study tests the hypothesis that New York City’s multitentacled subway system was a 
major disseminator - if not the principal transmission vehicle - of coronavirus infection during 
the initial takeoff of the massive epidemic that became evident throughout the city during March 
2020. We emphasize the correlational nature of our investigation. We cannot point to a definitive 
intervention comparable to the removal of the handle on the Broad Street pump in St. James’s 
parish, advocated by Dr. John Snow, which dramatically shut down a cholera outbreak in
mid-nineteenth century London (Snow 1855).

Right from the get-go, one might conjecture instead that the public schools were actually 
the fuse that lit the COVID-19 timebomb. This hypothesis is indirectly supported by the key 
roles played by the closures of public schools and the subsequent vaccination of young 
schoolchildren in blunting outbreaks of influenza in mid-twentieth century Japan (Reichert et al. 
2001). While the New York City public school system has educated over 1.1 million students in 
more than 1,700 public schools, the city’s public subway system, we shall soon see, has typically 
chauffeured more than 5 million rides per working day - from Eighth Avenue in Manhattan to 
Euclid Avenue in Brooklyn, from Lexington Avenue in the Bronx, with just one transfer, to 
Forest Hills-71st Avenue in Queens.

Numerous recent reviews have focused sharply on the blame for the coronavirus 
calamity. One writer noted, for example, that “the initial efforts of New York officials to stem 
the outbreak were hampered by their confused guidance, unheeded warnings, delayed decisions 
and political infighting.” (Goodman 2020) While our study has some bearing on what future 
steps might be taken to further flatten the curve of the epidemic, our intention here is to stay 
away from name-calling and name-naming. We avoid adversarial language like the plague. 

Reported COVID-19 Cases and Subway Turnstile Entries During March 2020 

Figure 1 simultaneously tracks the daily movements of two variables from March 1 
through April 3, 2020. The pink-filled circles show the numbers of new coronavirus infections
reported each day by the New York City Department of Health (New York Department of Health 
and Mental Hygiene 2020). For this variable, the vertical axis on the left is rendered on a 
logarithmic scale. That way, a straight-line trend would represent the exponential growth 
typically seen during the initial upsurge of an epidemic where everyone in the population is 
naive to the infectious agent (Harris 2020). 
Figure 1. Numbers of Newly Diagnosed COVID-19 Cases (Pink Data Points, Left Axis) and Millions of Subway
Turnstile Entries (Blue Bars, Right Axis), New York City, March 1-April 3, 2020. 


For the same variable of newly reported cases, the horizontal axis at the bottom ticks off 
the date that the coronavirus test was performed. By contrast, in Figure 1 of the first article in 
this series (Harris 2020), we tracked newly reported infections in relation to the date the test 
results were received. The new reporting convention, which has been recently adopted by the 
city’s health department, has the advantage that it cuts out the delay between the date that a 
healthcare worker swabbed a sample from a patient’s nose or throat and the date that the 
laboratory notified the department of the test result. It has the disadvantage, however, that the 
most recent daily counts are unreliable because the department is still waiting for the lab reports 
to come in. 

No matter what convention is employed to mark off the calendar on the horizontal axis, 
the trend in the daily reported incidence of new COVID-19 cases tells the same story. There is a 
rapid upswing during the first half of the month, with a doubling time in Figure 1 of just 1.4 
days, followed by a marked slowing with a doubling time of 19 days. As we’ve earlier discussed, 
there are a number of valid reasons why the numbers of reported cases understate the total 
number of coronavirus infections. Still, when all of the indicators are viewed together, the
conclusion that the epidemic curve in New York City has been flattening is inescapable (Harris
2020).
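The doubling times quoted above correspond to the slope of a straight line on the logarithmic axis of Figure 1. The Python sketch below shows one common way to estimate such a slope, an ordinary least-squares fit of log(count) on day number, with doubling time = ln 2 / slope. It is not necessarily the exact procedure behind Figure 1; the function name `doubling_time` and the synthetic series are illustrative assumptions, not health-department data.

```python
import math

def doubling_time(daily_counts):
    """Doubling time in days, from a least-squares fit of log(count) on day index.

    Assumes consecutive daily counts, all positive (the log of zero is undefined).
    """
    if len(daily_counts) < 2 or any(c <= 0 for c in daily_counts):
        raise ValueError("need at least two positive daily counts")
    n = len(daily_counts)
    xs = range(n)
    ys = [math.log(c) for c in daily_counts]
    x_bar = (n - 1) / 2            # mean of 0, 1, ..., n - 1
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    growth_rate = sxy / sxx        # estimated daily exponential growth rate r
    if growth_rate <= 0:
        return math.inf            # flat or shrinking series: no finite doubling time
    return math.log(2) / growth_rate

# Synthetic series doubling every 1.4 days (illustration only, not NYC data)
synthetic = [10 * 2 ** (t / 1.4) for t in range(10)]
print(round(doubling_time(synthetic), 2))  # 1.4
```

Because the most recent daily counts are incomplete under the date-of-test convention, one would ordinarily leave the last few days out of such a fit.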

The second variable tracked in Figure 1 above represents the total numbers of entries 
every day into any of the approximately 4,600 turnstiles located throughout New York City’s 
496 subway stations. These counts are reported each week by the Metropolitan Transportation 
Authority (MTA) (Metropolitan Transportation Authority (MTA) 2020a, b, Whong 2020, 
Wellington 2020). This variable is represented as sky-colored vertical bars, measured in millions 
of entries tallied along the vertical axis on the right side of Figure 1. For this variable, the 
horizontal axis measures the dates on which riders passed through the system’s turnstiles. While 
the MTA also reports turnstile exits, the data do not allow an analyst to link a particular rider’s 
station of entry with that rider’s station of exit. 
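Turning the weekly MTA files into daily entry counts takes one preprocessing step. As we understand the published format, each turnstile (identified by its control area, unit, and SCP codes) reports a cumulative ENTRIES register every few hours, so daily entries are differences between successive readings. The sketch below assumes the readings have already been parsed into hypothetical `(turnstile_id, timestamp, cumulative_entries)` tuples; the cap `MAX_PLAUSIBLE` used to discard counter resets and glitches is an arbitrary assumption, and crediting each difference to the date of the later reading is a simplification.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical cap on entries between two successive readings of one turnstile;
# larger jumps (and negative ones) are treated as counter resets or glitches.
MAX_PLAUSIBLE = 10_000

def daily_entries(readings):
    """Sum per-turnstile register increments into entries per calendar date.

    readings: iterable of (turnstile_id, timestamp, cumulative_entries) tuples.
    Each increment is credited to the date of the later of the two readings.
    """
    by_turnstile = defaultdict(list)
    for turnstile_id, ts, cumulative in readings:
        by_turnstile[turnstile_id].append((ts, cumulative))
    per_day = defaultdict(int)
    for rows in by_turnstile.values():
        rows.sort()  # chronological order within each turnstile
        for (_, prev), (ts, cur) in zip(rows, rows[1:]):
            delta = cur - prev
            if 0 <= delta <= MAX_PLAUSIBLE:
                per_day[ts.date()] += delta
    return dict(per_day)

# Hypothetical readings for two turnstiles (not actual MTA data)
readings = [
    ("A-1", datetime(2020, 3, 2, 0), 1000),
    ("A-1", datetime(2020, 3, 2, 4), 1150),
    ("A-1", datetime(2020, 3, 2, 8), 1600),
    ("A-1", datetime(2020, 3, 3, 0), 2000),
    ("B-7", datetime(2020, 3, 2, 0), 50),
    ("B-7", datetime(2020, 3, 2, 12), 20),   # register reset: increment skipped
    ("B-7", datetime(2020, 3, 2, 20), 320),
]
print(daily_entries(readings))
```

Exits could be handled the same way, but, as noted above, they still could not be matched to particular entries.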

Figure 1 shows only the volume of rides from March 1 onward. Still, the counts shown 
during the first full week of the month - from Sunday March 1 through Saturday March 7 - are
quite typical of the pattern for prior weeks, peaking during mid-week at about 5.5 million rides 
per day and dropping during the weekends (Schneider 2020). During the second week of March, 
however, we begin to see a slight decline in subway usage, overall about 19 percent lower than 
the previous week. This decline in subway use accelerates markedly beginning on Monday 
March 16, the day that New York City Mayor de Blasio issued an order limiting gatherings and 
closing numerous places of congregation. Overall, by the third week, subway usage is down 68
percent from the first week in March, and by the fourth week, it’s down 86 percent.
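These week-over-week comparisons amount to summing daily entries into calendar weeks and expressing each week's total relative to the first. A minimal sketch, assuming the daily list starts on the first day of a week (Sunday, March 1, in the text) and using made-up numbers rather than MTA counts; `weekly_decline` is a hypothetical helper name:

```python
def weekly_decline(daily, week_len=7):
    """Percent decline of each complete week's total relative to the first week.

    Assumes daily[0] is the first day of a week; a trailing partial week is dropped.
    """
    weeks = [
        sum(daily[i:i + week_len])
        for i in range(0, len(daily) - week_len + 1, week_len)
    ]
    base = weeks[0]
    return [round(100 * (1 - w / base), 1) for w in weeks]

# Made-up daily entries (millions): three full weeks plus two extra days
daily = [5.0] * 7 + [4.0] * 7 + [1.5] * 7 + [1.0, 1.0]
print(weekly_decline(daily))  # [0.0, 20.0, 70.0]
```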

Simple comparison of the two trends in Figure 1 cannot by itself answer questions of 
causation. The parallel between the continued high ridership on MTA subways and the rapid, 
exponential surge in infections during the first two weeks of March at best supports the 
hypothesis that the subways played a role. While the subsequent plummeting of ridership appears 
likewise to parallel the flattening of the reported incidence curve, the steep fall in the heights of 
the blue bars may just as well represent the public’s response to widespread publicity about the 
ferocity of the outbreak that had been gathering force for two weeks. As economists say, the
precipitous drop in subway ridership may well have been endogenous. 
Subway Ridership by Borough

Figure 2 focuses more sharply on the trends in subway turnstile entries, breaking down 
the trends by the borough in which the subway station of entry was located. We have included 
the Staten Island railway, which connects to Manhattan via the Staten Island Ferry. The vertical 
axis now measures turnstile entries as a percentage of the volume recorded on Monday, March 2, 
2020. To better appreciate the proportional changes in ridership, the vertical axis is rendered on a 
logarithmic scale. 
Figure 2. Daily Numbers of Turnstile Entries for the Five Boroughs of New York City, Computed on a Logarithmic
Scale as a Percentage of Peak Ridership on March 2, 2020 (Corrected). 


During the first week of March, the ridership volumes in the five boroughs, calculated in 
percentage terms, are indistinguishable, except for a greater weekend drop in Staten Island.
As the second calendar week comes to a close, we can begin to see a divergence among 
boroughs, which becomes increasingly prominent over time. By Monday March 23, Manhattan 
ridership has fallen to 10.5 percent of its March 2 volume, as shown by the purple data points, 
and by Monday March 30, it’s down to 7.8 percent of peak. By contrast, the Bronx, represented by
the sky-blue data points, was down to 25.2 percent of peak volume by Monday March 23 and 
20.3 percent of peak by Monday March 30. Staten Island, represented by the mango data points, 
experienced an even smaller drop in volume. These observations are consistent with a previously
reported finding that the decline in subway ridership was much larger among those New York 
City census tracts with the highest median income (Wellington 2020). 
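The vertical axis of Figure 2 can be reproduced by dividing each borough's daily entries by that borough's own March 2 count. The sketch below does this for hypothetical borough series and also shows why a logarithmic axis suits proportional comparisons: equal ratios map to equal vertical distances. The function names and numbers are illustrative assumptions.

```python
import math

def pct_of_baseline(by_borough, baseline_idx=0):
    """Rescale each borough's daily entries to a percentage of its own baseline day."""
    return {
        borough: [100.0 * v / series[baseline_idx] for v in series]
        for borough, series in by_borough.items()
    }

def log_gap(a, b):
    """Vertical distance between two values on a base-10 logarithmic axis."""
    return math.log10(a) - math.log10(b)

# Hypothetical entry counts; index 0 plays the role of Monday, March 2
example = {"Manhattan": [2000, 700, 200], "Bronx": [500, 300, 125]}
indexed = pct_of_baseline(example)
print(indexed["Manhattan"])  # [100.0, 35.0, 10.0]
# A halving looks the same on a log axis wherever it occurs:
print(math.isclose(log_gap(100, 50), log_gap(20, 10)))  # True
```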

For each of the five boroughs, Figure 3 compares the percentage decline in turnstile 
entries from March 2 through March 16, shown on the horizontal axis, against the estimated 
doubling times of new reported COVID-19 cases 15 days later during the week starting on 
March 31. The borough of Manhattan stands out from the other four. By March 16, Manhattan 
turnstile entries had fallen 65 percent below their March 2 peak. About two weeks later, the trend
in the number of new reported infections was virtually flat, with a doubling time of 20 days. 
From formulas developed in our earlier report (Harris 2020), it is likely that the reproductive 
number R in Manhattan as a whole is now less than 1. That is, the number of individuals coming 
down with a new coronavirus infection during any given day is outweighed by the number of 
previously infected individuals who lost their infectivity during that same day. 
Figure 3. Percentage Reduction in Daily Turnstile Entries from March 2 to March 16 Versus the Estimated
Doubling Time of New Reported COVID-19 Cases During the Subsequent Week from March 31 to April 7. Five
Boroughs, New York City.
The finding that a 65-percent drop in subway ridership is associated with a subsequent
reversal of the COVID-19 epidemic in the borough of Manhattan hardly proves causation. It
could be that the decline in ridership is no more than an indicator - what economists call a proxy 
- for other concurrent social distancing activities that ultimately contributed to the observed 
decline in reported infections. In any event, it would be inappropriate to draw firm conclusions 
from what would amount to a Manhattan-versus-the-rest study. Still, the analysis points us in the 
direction of a finer, more detailed examination of the relation between trends in subway ridership 
and coronavirus propagation at the geographic level. 

Diversity of COVID-19 Incidence by New York City Zip Code 

Figures 4 and 5, respectively, map the cumulative numbers of COVID-19 cases per 
10,000 population in New York City zip codes at two points in time: March 31 and April 8. In 
each map, we use the same fixed three-class color scheme to characterize the cumulative 
incidence. Light green signifies a cumulative incidence rate less than 70 cases per 10,000.
Medium green signifies a rate of at least 70 but less than 100 cases per 10,000. Dark green
stands for a rate of at least 100 per 10,000, which is equivalent to saying that at least 1
percent of the population has been infected as of the specified date. These maps were modified 
from published maps depicting the numbers of positive tests, but not incidence (New York 
Department of Health and Mental Hygiene 2020).
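The three-class scheme used in Figures 4 and 5 reduces to a rate calculation and two thresholds. A minimal sketch, with hypothetical case and population figures for a single zip code; the function names are illustrative, and the class labels simply mirror the map colors described above:

```python
def incidence_per_10k(cases, population):
    """Cumulative reported cases per 10,000 residents."""
    return 10_000 * cases / population

def incidence_class(rate):
    """Assign a rate (cases per 10,000) to one of the three map classes."""
    if rate < 70:
        return "light green"
    if rate < 100:
        return "medium green"
    return "dark green"  # at least 100 per 10,000, i.e. at least 1 percent

# Hypothetical zip code: 450 reported cases among 50,000 residents
rate = incidence_per_10k(450, 50_000)
print(rate, incidence_class(rate))  # 90.0 medium green
```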

Comparison of the two maps, depicting the evolution of the coronavirus epidemic over 
just 9 days, shows the initial seeding and subsequent spread from several distinct hotspots: 
Borough Park (11219) and Midwood (11230) in Brooklyn; Morris Park-Westchester Square 
(10461) in the Bronx; a swath of contiguous zip codes extending eastward from East Elmhurst 
(11370) in Queens; and a hotspot centered around Midtown West (10018) in Manhattan. By 
April 8, the zip code with the highest cumulative incidence was East Elmhurst (11370) with 180 
cases per 10,000 population. 

Looking at the data on subway station-specific turnstile entries and zip code-specific 
infection rates, many economists may see the makings of a difference-in-differences analysis. 

For each station, the idea is first to compute the time trends in turnstile entries and coronavirus 
incidence, and then to assess whether there is a relation between the two trends across different
subway stations (Fredriksson and Oliviera 2019). Unfortunately, there is a serious problem with 
this extraordinarily popular method of doing policy analysis (Bertrand, Duflo, and Mullainathan 
2004). In particular, there is likely to be significant serial correlation in the outcomes among
adjacent subway stations situated along the same line. 
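To make the objection concrete, here is a simplified version of the station-by-station comparison described above, not a full difference-in-differences regression: compute each station's percentage change in entries and in the incidence of its surrounding zip code, then correlate the two across stations. The data and helper names are hypothetical. Any standard error attached to such a correlation would treat the stations as independent observations, which is exactly the assumption that fails for adjacent stops on the same line.

```python
import math

def pct_change(before, after):
    return 100.0 * (after - before) / before

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def station_trend_correlation(stations):
    """stations: (entries_before, entries_after, incidence_before, incidence_after) per station.

    Correlates the percentage change in entries with the percentage change in
    incidence, implicitly treating every station as an independent observation.
    """
    ridership = [pct_change(e0, e1) for e0, e1, _, _ in stations]
    incidence = [pct_change(i0, i1) for _, _, i0, i1 in stations]
    return pearson(ridership, incidence)

# Hypothetical stations, not actual MTA or health-department figures
stations = [(100, 50, 10, 15), (100, 80, 10, 18), (100, 20, 10, 12)]
print(round(station_trend_correlation(stations), 3))  # 1.0
```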


Figure 4. Map of Cumulative Numbers of Coronavirus Infections per 10,000 Population According to Zip Code of Residence, New York City, as of March 31, 2020.


The problem, put differently, is that the individual subway stations are not 
epidemiologically independent entities. Consider a service worker using public transportation in 
New York City, who typically takes more than a half-hour to commute to work (Choi, 
Velasquez, and Welch 2020). Specifically, she takes the Flushing Local line, entering the 
turnstile at the Junction Boulevard stop, located within the Corona zip code (11368) in Queens,
getting off at the 34th Street-11th Avenue stop at the end of the line, from which she walks to her
work in the Midtown West zip code (10018). 



Figure 5. Map of Cumulative Numbers of Coronavirus Infections per 10,000 Population According to Zip Code of Residence, New York City, as of April 8, 2020.

We’ll call our commuter Milagros, a name honoring Nuestra Señora de los Milagros,
inasmuch as zip code 11368 is 74% Hispanic-Latino (USZip 2020b). Once Milagros boards the
train, the next two stops are 90th Street-Elmhurst Avenue and 82nd Street-Jackson Heights,
smack-dab between zip codes 11372 (Jackson Heights) and 11373 (Elmhurst), which were
already emerging hotspots of infection by March 31. From 82nd St.-Jackson Heights, it would
take Milagros just five minutes to walk to the Elmhurst Hospital Emergency Department. 

Milagros’s exposure to coronavirus is not accurately gauged by the number of commuters 
who passed through the turnstile at her entry point at Junction Boulevard. That’s because she’ll 
come into contact with potentially infectious passengers at each of the remaining 17 stops until 
she gets off at 34th Street-11th Avenue, which happens to be located in another coronavirus
hotspot. On the way back home, she will also be exposed to those passengers staying on the
Flushing Local and disembarking after Milagros does - at the 103rd St-Corona Plaza, 111th
Street, and Mets-Willets Point stations, likewise located in hotspot zip codes. In view of these
interdependencies between units of observation, the classic technique of difference-in-differences
routinely employed in policy evaluation is, as Milagros would put it, arrojado por la ventana
(thrown out the window).
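Milagros's point can be put in numbers. If one knew how many riders boarded and alighted at each stop of a single train run, the number aboard on each segment would be a running sum, and her exposure would depend on the loads over every segment she rides, not on the entries at her own station. The MTA data do not link entries to exits, so the boarding and alighting figures in this sketch are purely hypothetical, as are the function names.

```python
from itertools import accumulate

def onboard_loads(boardings, alightings):
    """Riders aboard on each segment of a one-direction run.

    boardings[k] and alightings[k] are riders getting on and off at stop k;
    loads[k] is the number aboard between stop k and stop k + 1.
    """
    net = [b - a for b, a in zip(boardings, alightings)]
    return list(accumulate(net))[:-1]  # no segment after the last stop

def shared_rider_segments(loads, board_at, alight_at):
    """Sum of loads over the segments a rider travels (includes the rider herself)."""
    return sum(loads[board_at:alight_at])

# Hypothetical five-stop run (not actual Flushing Local data)
boardings = [100, 50, 40, 10, 0]
alightings = [0, 10, 20, 30, 140]
loads = onboard_loads(boardings, alightings)
print(loads)                               # [100, 140, 160, 140]
print(shared_rider_segments(loads, 0, 4))  # 540
print(shared_rider_segments(loads, 3, 4))  # 140
```

A rider boarding at the fourth stop, where only 10 people entered, still shares the car with 140 others on her one segment.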

Subway Lines Are the Correct Units of Analysis

Figure 6 superimposes on the zip code map of Figure 5 the stops along the Flushing Local line that
tens of thousands of passengers like Milagros took every day back and forth between stations at the
eastern end of the line in Queens and stations at the western end in Midtown Manhattan.


Figure 6. Stops Along the Flushing Local Line in the New York City Subway System Superimposed on a Section of
the Zip Code Map in Figure 5. The outer area of each point corresponds to the volume of turnstile entries during the 
first week in March 2020, while the inner area corresponds to the volume during the third week of that month. 

The outer area of each circle corresponds to the volume of turnstile entries at that
station during the first week in March, while the inner area corresponds to the volume during the 
third week in March. As we would anticipate from the data in Figures 2 and 3, the volume of
turnstile entries declined to some extent at all of the station stops along the Flushing Local line. 
While the percentage decline was considerably greater at the Manhattan stops, the absolute 
numbers of entries at Grand Central-42nd Street and Times Square-42nd Street turnstiles during
the third week in March were still comparable to those at the other end of the line. 
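Drawing circles whose areas, rather than radii, track turnstile volume requires a square root. We do not know exactly how Figure 6 was drawn; the sketch below is one standard way to size such symbols, with a hypothetical `area_radius` helper and an arbitrary `scale` parameter. Sizing by radius instead would square the apparent ratio between two stations.

```python
import math

def area_radius(volume, scale=1.0):
    """Radius of a circle whose area equals scale * volume."""
    return math.sqrt(scale * volume / math.pi)

# Hypothetical station: third-week entries are 25% of first-week entries
outer = area_radius(40_000, scale=0.01)
inner = area_radius(10_000, scale=0.01)
print(round(inner / outer, 3))  # 0.5
```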

The data in Figure 6 are compatible with continued but reduced propagation of 
coronavirus infection along the Flushing Local line during the third week of March. The stations 
run through the hot spots in the Elmhurst area and terminate at the hotspot zip code in West 
Midtown Manhattan. The line also runs through Long Island City zip code 11101, another 
hotspot with a 34.5% Hispanic-Latino, 18.5% African-American and 15.9% Asian demographic 
profile, where 71.6 percent of workers take public transportation (USZip 2020a). 

In the classic, static model of epidemic propagation (Harris 2020, Kermack and 
McKendrick 1991), susceptible individuals (the S’s) make contact with infective individuals (the
I’s). The incidence of new infections depends on two factors: the frequency of contact between
an S and an I, and the probability that each contact results in transmission of the infection. The
model was borrowed from the basic law of mass action in chemistry, where S and I molecules
collide with one another, bouncing around in a gas or a liquid. In an innovative series of
papers, Gosce and colleagues generalized this model to consider contagion when the S’s and I’s 
move along a corridor (Gosce, Barton, and Johansson 2014, Gosce and Johansson 2018). They 
applied their framework to the study of the spread of influenza-like illness in the London 
Underground, a vast network opened just nine years after Dr. John Snow got public officials to 
disable a pump at Broad (now Broadwick) and Lexington Streets, now about a five-minute walk 
from the Oxford Circus station. 
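In the textbook deterministic form of the Kermack-McKendrick model, the two factors just named multiply into a single transmission rate. Writing c for the contact rate per person, p for the probability that a contact transmits infection, N for the population, and γ for the rate at which infectives lose their infectivity (notation introduced here for illustration, not taken from the cited papers):

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \, \frac{S I}{N}, \qquad \beta = c\,p, \\
\frac{dI}{dt} &= \beta \, \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \\
\frac{dI}{dt} < 0 \;&\iff\; R_t \equiv \frac{\beta S}{\gamma N} < 1 .
\end{aligned}
```

The last line restates the condition described earlier for Manhattan: new infections are outweighed by infectives losing their infectivity exactly when the effective reproduction number, written R_t here to distinguish it from the removed compartment R, falls below 1.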

The Gosce model offers a number of insights that are immediately applicable to the data 
from the New York City Flushing subway line. The first is that the rate of disease transmission is 
related to the number of trips and average number of stations per trip along the entire subway 
line, and not just to the number of entries at any one subway station. Second, passengers entering 
the subway line even at a remote, less populous station are slowing down the system, thus
increasing the transit time that the S’s stay in contact with the I’s. Third, those uninfected
S-passengers who cram shoulder-to-shoulder into a particular subway car are increasing train-car
density and thus raising the average number of other S-passengers infected by an I-passenger
who happens to be standing in the middle of the train. Fourth, local trains - like the Flushing 
local - are more likely to seed epidemic infections than express lines. Finally, an entire subway 
line, rather than the individual stations or subway cars, is the appropriate unit of analysis. 
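A deliberately crude toy, and not the Gosce model itself, illustrates two of these insights: applying mass action segment by segment along a line, expected transmissions grow with the minutes riders spend aboard and with crowding. Each segment carries hypothetical counts of susceptible and infective riders; `BETA` is an assumed transmission coefficient and `CAR_VOLUME` an assumed measure of space per car. A local run is represented simply as taking longer over the same stretch because of its extra stops. The toy leaves out the extra mixing that comes from picking up riders at every local stop.

```python
def expected_transmissions(segments, beta, car_volume):
    """Mass action applied segment by segment along one train run.

    segments: (minutes, susceptibles_aboard, infectives_aboard) per segment.
    Dividing by car_volume turns head counts into densities, so crowding matters.
    """
    return sum(beta * minutes * s * i / car_volume for minutes, s, i in segments)

BETA = 0.001        # hypothetical transmission coefficient
CAR_VOLUME = 100.0  # hypothetical space units per car

# Same 200 susceptible and 2 infective riders over the same stretch of track:
express = [(4.0, 200, 2)] * 3   # 3 segments, 12 minutes in all
local = [(2.5, 200, 2)] * 6     # 6 segments including dwell time, 15 minutes in all
print(expected_transmissions(express, BETA, CAR_VOLUME))
print(expected_transmissions(local, BETA, CAR_VOLUME))
```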

A Bunch of Garbage 

While we’ve got a few more maps up our sleeve, we’re already at a juncture where some 
readers may react with extreme skepticism. We’ve already admitted we don’t have a cleanly 
designed natural experiment. None of Dr. Snow’s successors (he died of a stroke at age 45, four
years after the handle came off the Broad Street pump) managed to get the Flushing Local and
the rest of the MTA abruptly shut down at the end of February. Without such evidence, the 
naysayers will assert that any diffuse, multitentacled network that traverses most of the city 
could be correlated spatially with the spread of coronavirus infection documented above. To be 
sure, serious critics won’t point to the electromagnetic signals from power lines, but they could 
argue that the path traced in Figure 6 could just as well represent the stops of sanitation trucks. 
Put bluntly, the critique goes, the evidence presented thus far would be consistent with 
contaminated garbage as the vehicle for the massive spread of deadly COVID-19. 

Except for one thing - namely, we know that the garbage hypothesis is entirely 
implausible. We know that close contact in subways is fully consistent with the spread of 
coronavirus, either by inhalable droplets or residual fomites left on railings, pivoted grab 
handles, and those smooth, metallic, vertical poles that everyone shares. We know that the 
flattening of the epidemic curve in Manhattan two weeks after that borough had cut its subway 
ridership by 65 percent adds tellingly to the circumstantial evidence. We know that we can’t 
dismiss out of hand our finding of reciprocal seeding from the periphery of the Flushing local 
line to Manhattan’s only hotspot in Midtown West, and from that central hub back to the 
periphery. We know that many workers - especially non-White workers - have been trapped by 
economic necessity into continuing to expose themselves to the bad stuff millions of times daily 
(Goldbaum and Cook 2020). We know that it would be inappropriate to require the subway 
hypothesis to explain every aspect of the diffusion of coronavirus, if only because we have buses
and schools, too, and because Milagros, once she got sick, didn’t have her own bedroom and
bathroom to isolate herself.

Overlaying the Other Subway Lines on the Epidemic Map 

Figure 7 superimposes comparable data from the 6th Avenue Local line (also called the
Queens Blvd Local line) onto the epidemic map of Figure 6. As in the previous figure, the subway
stops of the 6th Avenue Local run right through the hotspot zip codes. What’s more, the inner circles,
colored dark blue, show a significantly greater decrease in volume at the Manhattan stops by
the third week in March. These additional data in Figure 7 are further compatible with the
conclusion that coronavirus infection, though at a reduced rate compared with the first week of
March, was continuing to spread along subway lines through at least the third week of March.



Figure 7. Stops Along the Flushing Local Line and 6th Avenue Local Line in the New York City Subway System
Superimposed on a Section of the Zip Code Map in Figure 5. The outer area of each point corresponds to the
volume of turnstile entries during the first week in March 2020, while the inner area corresponds to the volume
during the third week of that month.


The last station on the 6th Avenue Local line is Jamaica-179th Street, a major hub for
local bus routes in Queens (Metropolitan Transportation Authority (MTA) 2018). From there,
one can take the 43 bus along Hillside Avenue to reach Bellerose Manor (zip code 11426), at the
eastern end of the conglomeration of zip code hotspots within the borough shown in Figure 5.
Alternatively, one can take the 111 bus down to Rosedale (zip code 11422) in the southeast
corner, where 81 percent of residents are African-American (USZip 2020c).

Following the same conventions as in the two previous figures, Figure 8 overlays 
multiple subway lines on the zip code map of Figure 5. The individual stops for the Staten Island 
line are included, although the MTA database does not provide sufficient data to show the 
changes over time within each station. While Figure 8 does not show every subway line in the 
city, it is intended here to illustrate the breadth and reach of the subway system. 
Figure 8. Subway Stops Along Multiple Routes in the Four Principal Boroughs of New York City, Superimposed
Upon the Zip Code Map of Figure 5. See text for details.

Irony Along Eighth Avenue 

The Metropolitan Transportation Authority’s decision to cut back its train service to
accommodate the reduced demand may have indeed helped to shore up the agency’s financial 
position, but it most likely accelerated the spread of coronavirus throughout the city. That’s 
because the resulting reduction in train service tended to maintain passenger density, the key 
factor driving viral propagation (Goldbaum and Cook 2020). How ironic it is that, from the 
public health perspective, the optimal policy would have been to double - maybe even triple -
the frequency of train service. The agency’s decision to convert multiple express lines into local 
service only enhanced the risk of contagion (Goldbaum 2020). How ironic it is that the preferred 
policy would have been to run even more express lines. We have not seen any public data on the 
incremental cost of the agency’s decision to begin to disinfect subway cars twice daily. Still, it is 
natural to inquire why the cars weren’t disinfected every time they emptied out of passengers at 
both ends of the line. 
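The arithmetic behind this irony is simple: average crowding per car is ridership divided by the number of cars run. The figures below are hypothetical, chosen only to show how a service cut can offset much of a ridership decline; `riders_per_car` is an illustrative helper, and the sketch ignores peaking within the hour.

```python
def riders_per_car(riders_per_hour, trains_per_hour, cars_per_train):
    """Average riders per car over an hour (ignores peaks within the hour)."""
    return riders_per_hour / (trains_per_hour * cars_per_train)

# Hypothetical line: before; after a 60% ridership drop with service halved;
# and after the same ridership drop with service doubled instead
before = riders_per_car(20_000, 20, 10)   # 100.0
cut = riders_per_car(8_000, 10, 10)       # 80.0: only 20% less crowded
doubled = riders_per_car(8_000, 40, 10)   # 20.0: 80% less crowded
print(before, cut, doubled)
```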

With the incidence of new infections and COVID-19 hospitalizations leveling off (Harris 
2020), there will be increasing interest in relaxing social distancing measures. During these 
renormalization times, the public transportation system will surely require enhanced scrutiny. 
That means even more attention to staggered work hours, limits on the numbers of passengers 
per transport unit, refurbished vehicles with enhanced ventilation, subsidies for drivers to 
transport workers in SUVs, vans and minibuses, new technologies to determine which stations an 
infected person entered and exited, and redirection of passenger traffic to less dense lines. 

This study has touched upon the differential impact of the COVID-19 pandemic on those 
with the fewest resources. As we put this working paper to press, there have been mounting calls 
for more data on racial and ethnic minorities. How ironic it is that this point was well aired more 
than two decades ago (Farmer 1996).

Quite apart from the present study and the above-cited work by Gosce and colleagues 
(Gosce, Barton, and Johansson 2014, Gosce and Johansson 2018), a few other researchers have 
attempted to test whether public transport has served as a critical vehicle for the propagation of 
contagious respiratory diseases (Sun et al. 2013, Troko et al. 2011). An overall assessment of 
these research efforts would surely lead a scientific reviewer to conclude that cause-and-effect is 
difficult to prove. Still, we doubt whether any public health practitioner would be reluctant to 
take action on the basis of the facts we now know. 